A spectrogram model for enhanced source localization and noise-robust ASR
Abstract
This paper proposes a simple, computationally efficient two-component mixture model for discriminating between speech and background noise. The model is derived directly from observations on real data and can be fitted in a fully unsupervised manner with the EM algorithm. A first application, to sector-based joint audio source localization and detection using multiple microphones, confirms that the model can provide major enhancement. A second application, to single-channel speech recognition in a noisy environment, yields major improvement on stationary noise and promising results on non-stationary noise.
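The abstract does not state the form of the two mixture components. The sketch below is a minimal illustration under loudly stated assumptions: each time-frequency bin's log-energy is treated as an independent sample drawn from one of two 1-D Gaussians (a low-energy "noise" component and a high-energy "speech" component), and the mixture is fitted with EM without any labels. The paper's actual component distributions may well differ. All function names (`fit_two_component_gmm`, `speech_posterior`) and the synthetic data are hypothetical.

```python
import math
import random


def log_gauss(v, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def _posterior_one(v, w, mu, var):
    """Return (P(component 1 | v), log p(v)) using log-sum-exp for stability."""
    lp = [math.log(w[k]) + log_gauss(v, mu[k], var[k]) for k in (0, 1)]
    top = max(lp)
    lse = top + math.log(math.exp(lp[0] - top) + math.exp(lp[1] - top))
    return math.exp(lp[1] - lse), lse


def fit_two_component_gmm(x, n_iter=200, tol=1e-9, var_floor=1e-6):
    """Unsupervised EM fit of a two-component 1-D Gaussian mixture (hypothetical helper).

    Returns (weights, means, variances); index 0 is the lower-mean component
    (assumed 'noise'), index 1 the higher-mean component (assumed 'speech').
    """
    n = len(x)
    xs = sorted(x)
    # Initialise the two means at the lower and upper quartiles of the data.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    m_all = sum(x) / n
    v_all = max(sum((v - m_all) ** 2 for v in x) / n, var_floor)
    var = [v_all, v_all]
    w = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for every sample, plus log-likelihood.
        resp, ll = [], 0.0
        for v in x:
            r, lse = _posterior_one(v, w, mu, var)
            resp.append(r)
            ll += lse
        # M-step: re-estimate weights, means and variances from the responsibilities.
        n1 = max(sum(resp), 1e-12)
        n0 = max(n - sum(resp), 1e-12)
        w = [n0 / n, n1 / n]
        mu0 = sum((1.0 - r) * v for r, v in zip(resp, x)) / n0
        mu1 = sum(r * v for r, v in zip(resp, x)) / n1
        var0 = max(sum((1.0 - r) * (v - mu0) ** 2 for r, v in zip(resp, x)) / n0, var_floor)
        var1 = max(sum(r * (v - mu1) ** 2 for r, v in zip(resp, x)) / n1, var_floor)
        mu, var = [mu0, mu1], [var0, var1]
        # Stop once the relative log-likelihood gain becomes negligible.
        if ll - prev_ll < tol * abs(ll):
            break
        prev_ll = ll
    # Keep the convention that component 1 is the higher-energy ('speech') one.
    if mu[0] > mu[1]:
        w, mu, var = w[::-1], mu[::-1], var[::-1]
    return w, mu, var


def speech_posterior(x, params):
    """Posterior probability that each value in x belongs to the 'speech' component."""
    w, mu, var = params
    return [_posterior_one(v, w, mu, var)[0] for v in x]


# Synthetic log-energies (hypothetical data): 700 'noise' bins near 0, 300 'speech' bins near 5.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(700)] + [rng.gauss(5.0, 1.0) for _ in range(300)]
w, mu, var = fit_two_component_gmm(data)
post = speech_posterior(data, (w, mu, var))
mask = [p > 0.5 for p in post]  # per-bin speech/noise decision; the 0.5 threshold is an assumption
print("weights:", w, "means:", mu, "variances:", var)
```

Because the fit needs no labels, the same procedure could in principle be rerun on each new recording; the resulting per-bin posteriors are one plausible way to obtain the kind of speech/noise discrimination the abstract describes, not a reproduction of the paper's method.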
Similar papers
Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR
The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments, in part due to the lack of robustness in the underlying representations used in the ASR system. On the other hand, auditory processing studies have shown the importance of modulation-filtered spectrogram representations in robust human speech recognition. Inspired by these...
Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions
Recent automatic speech recognition (ASR) results are quite good when the training data is matched to the test data, but much worse when they differ in some important regard, like the number and arrangement of microphones or differences in reverberation and noise conditions. This paper proposes an unsupervised spatial clustering approach to microphone array processing that can overcome such tra...
Robust speech recognition using the modulation spectrogram
The performance of present-day automatic speech recognition (ASR) systems is seriously compromised by levels of acoustic interference (such as additive noise and room reverberation) representative of real-world speaking conditions. Studies on the perception of speech by human listeners suggest that recognizer robustness might be improved by focusing on temporal structure in the speech signal th...
Morphological filtering of speech spectrograms in the context of additive noise
A recent approach to signal segmentation in additive noise [1, 2] uses features of small spectrogram sub-units accrued over the full spectrogram. The original work considered chirp signals in additive white Gaussian noise. This paper extends this work first by considering similar signals at different signal-to-noise ratios and then in the context of speech recognition. For the chirp case, a cos...
Speech Enhancement via Combination of Wiener Filter and Blind Source Separation
Automatic speech recognition (ASR) often fails in acoustically noisy environments. To improve the recognition scores of an ASR system in a real-life-like acoustic environment, this paper proposes a speech pre-processing system consisting of several stages: first, a convolutive blind source separation (BSS) is applied to the spectrogram of the signals that are pre-processed by bin...
Publication date: 2005